Scaling machine learning using approximate counting

ABSTRACT

A system may track statistics for a number of features using an approximate counting technique by: subjecting each feature to multiple, different hash functions to generate multiple, different hash values, where each of the hash values may identify a particular location in a memory, and storing statistics for each feature at the particular locations identified by the hash values. The system may generate rules for a model based on the tracked statistics.

RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 11/749,588, filed May 16, 2007, the entire disclosure of which is incorporated herein by reference.

BACKGROUND

Machine learning often refers to the design and development of algorithms and techniques that allow computers to learn. The major focus of machine learning research is to extract information from data automatically by computational and statistical methods.

SUMMARY

According to one aspect, a method may include identifying a feature of a number of features in a repository; performing a number of different hash functions on the feature to generate a corresponding number of different hash values; identifying buckets of a group of buckets in a memory based on the hash values; reading values from the identified buckets; updating the values; writing the updated values into the identified buckets; and generating rules for a model based on the values in the group of buckets.

According to another aspect, a device may include a memory and a processor. The memory may store statistics regarding a group of features in buckets. The processor may identify a feature of the group of features and subject the feature to a number of different hash functions to generate a number of different hash values, where the number of hash functions includes at least three different hash functions. The processor may identify a number of the buckets in the memory based on the number of different hash values, read the statistics from the identified buckets, update the statistics, and write the updated statistics into the identified buckets. The processor may also generate rules for a model based on the statistics in the buckets.

According to yet another aspect, a method may include identifying a feature of a set of features in a repository; performing a number of different hash functions on the feature to generate a corresponding number of different hash values; identifying a number of buckets in a memory based on the number of hash values; reading values from the identified buckets; determining a single value from the values; and generating rules for a model based on the single values for a group of the features in the repository.

According to a further aspect, a device may include a memory and a processor. The memory may store statistics regarding a set of features in buckets. The processor may identify a feature of the set of features, subject the feature to a number of different hash functions to generate a corresponding number of different hash values, identify a number of the buckets in the memory based on the number of hash values, read the statistics from the identified buckets, determine a single value from the statistics, and generate rules for a model based on the single values for a group of the features.

According to another aspect, a system implemented within one or more devices is provided. The system may include means for tracking statistics for a group of features using an approximate counting technique including: means for subjecting each feature of the group of features to multiple, different hash functions to generate multiple, different hash values, each of the hash values identifying a particular location in a memory, and means for storing statistics for each feature of the group of features at the particular locations identified by the hash values. The system may also include means for generating rules for a model based on the tracked statistics.

According to yet another aspect, a method may include tracking statistics for a group of features using an approximate counting technique including: tracking statistics for a subset of the features in a memory, identifying a new feature, identifying one of the features in the subset of the features with particular statistics, and replacing the statistics for the identified one of the features with statistics for the new feature in the memory; and generating rules for a model based on the tracked statistics in the memory.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one or more embodiments and, together with the description, explain these embodiments. In the drawings:

FIG. 1 is a diagram of an overview of an exemplary implementation described herein;

FIG. 2 is a diagram of an exemplary approximate counting system in which systems and methods described herein may be implemented;

FIG. 3 is a diagram of an exemplary node of FIG. 2;

FIG. 4 is a functional block diagram of an exemplary approximate counting system;

FIG. 5 is a functional block diagram of an exemplary configuration of the memory of FIG. 4;

FIGS. 6A-6C illustrate a flowchart of an exemplary process for performing approximate counting;

FIG. 7 is a functional block diagram illustrating an exemplary write function; and

FIG. 8 is a functional block diagram illustrating an exemplary read function.

DETAILED DESCRIPTION

The following detailed description refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention.

Overview

FIG. 1 is a diagram of an overview of an exemplary implementation described herein. As shown in FIG. 1, a repository may be formed from a large set of labeled data (e.g., over one million data elements). In one implementation, the labeled data may include data obtained from a server log. For example, the repository may include e-mail data, advertisement data, and/or other data indicative of user behavior. In one implementation, the data in the repository may be obtained from monitoring user behavior (e.g., the e-mails that users sent and/or received, and/or the advertisements presented to the users and/or selected by the users). User behavior may be monitored with the users' consent. In another implementation, the labeled data may include any type or form of labeled data.

The data in the repository may be used to create rules for a model. In one exemplary implementation, the data may include e-mail data, such as spam and regular (non-spam) e-mail, that may be used to create rules for a model that may predict whether future e-mails are spam. In another exemplary implementation, the data may include advertisement data, such as advertisements, Uniform Resource Locators (URLs), and/or user information, that may be used to create rules for a model that may predict whether a user will select a particular advertisement. In other exemplary implementations, other types or a combination of types of data may be used.

As shown in FIG. 1, certain statistics (e.g., count values, weight values, or other forms of statistics, such as mean values, standard deviation values, etc.) may be maintained regarding the data in the repository. Implementations described herein may use an approximate counting technique to maintain the statistics. As used herein, “approximate counting” is to be broadly interpreted to include something less precise than exact counting.

The implementations described herein may permit machine learning techniques to be used on a very large data set (e.g., a data set including over one million data elements). Because certain statistics regarding the large data set are tracked with approximate counting rather than exact counting, only a fraction of the devices and memory previously needed are required. The implementations described herein are also scalable in the sense that, as the size of the data set grows, additional devices and/or additional memory can be added, as needed, without significantly impacting the approximate counting process.

Suitable Approximate Counting System

FIG. 2 is a diagram of an exemplary approximate counting system 200 suitable for use with systems and methods described herein. System 200 may include nodes 210-1 through 210-J (where J≥1) (collectively referred to as nodes 210) optionally connected to a repository 220 via a network 230. Network 230 may include a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network, such as the Public Switched Telephone Network (PSTN) or a cellular network, an intranet, the Internet, or a combination of networks.

Repository 220 may include one or more logical or physical memory devices that may store a large data set (e.g., potentially over one million elements) that may be used to create and train a model.

The set of data in repository 220 may include multiple elements “d,” called instances. An example of an instance may include an e-mail or an advertisement. Repository 220 may store more than one million instances. Each instance d may include a set of features “X” and a label “Y.” The label Y may take one of two values (e.g., “spam” or “non-spam”), which may be called y₀ and y₁.

A feature X may be an aspect of an instance that may be useful to determine the label (e.g., “the number of exclamation points in the message” or “whether the word ‘free’ appears in the message”). Repository 220 may store more than one hundred thousand distinct features. As used herein, the term “feature” is intended to refer to a single feature or a combination of multiple features.

Each feature may have a feature name associated with it. For example, a feature $X_i$ may include the name “feature_i” or “drugs.” To ensure uniqueness, the feature name may be appended with the feature number (e.g., “Viagra4,” “cheap3,” or “drugs8”) or some other value (e.g., a random number). Each feature X may also include a feature value. In one implementation, a feature $X_i$ may include a Boolean value (e.g., a value of “0” or “1” based on whether the word “free” appears in the message). In another implementation, a feature $X_i$ may include a discrete value (e.g., a value based on the number of exclamation points in the message). In yet another implementation, a feature $X_i$ may include a continuous value (e.g., a value generated as some function of the number of exclamation points in the message or as a function of where “free” appears in the message). An instance d may be written as $d = (x_1, x_2, x_3, \ldots, x_t, y)$, where $x_i$ is the value of the i-th feature $X_i$ and $y$ is the value of the label.
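The following minimal sketch shows one way an instance d = (x₁, …, x_t, y) might be represented, with one feature of each kind described above. The names `Instance` and `extract_features`, the feature names, and the 0/1 label encoding are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """One labeled instance d = (x_1, ..., x_t, y)."""
    features: dict[str, float]  # feature name -> feature value x_i
    label: int                  # y; assumed encoding: 0 = y0 ("non-spam"), 1 = y1 ("spam")


def extract_features(message: str) -> dict[str, float]:
    """Hypothetical extractor producing one feature of each kind for an e-mail body."""
    # Lower-case and strip surrounding punctuation so that "free!" matches "free".
    words = [w.strip("!?.,") for w in message.lower().split()]
    has_free = "free" in words
    return {
        # Boolean feature: whether the word "free" appears.
        "free": 1.0 if has_free else 0.0,
        # Discrete feature: number of exclamation points.
        "exclamations": float(message.count("!")),
        # Continuous feature: relative position of the first "free" (-1.0 if absent).
        "free_position": words.index("free") / len(words) if has_free else -1.0,
    }
```

For example, `extract_features("Get it free now!!!")` yields `free` = 1.0, `exclamations` = 3.0, and `free_position` = 0.5.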

Nodes 210 may include entities. An entity may be defined as a device, such as a computer device, a personal digital assistant (PDA), a laptop, or another type of computation or communication device, a thread or process running on one of these devices, and/or an object executable by one of these devices.

Each of nodes 210 may be responsible for all or a portion of the instances. In one implementation, node 210 may obtain its instances from repository 220 when needed. In another implementation, each of nodes 210 may optionally store a copy of its instances in a local memory 215. In this case, node 210 may retrieve its copy from repository 220 and store the copy in local memory 215. In yet another implementation, each of nodes 210 may store its instances in local memory 215, and system 200 may include no repository 220.

FIG. 3 is a diagram of an exemplary single node 210. Node 210 may include a bus 310, a processor 320, a main memory 330, a read-only memory (ROM) 340, a storage device 350, an input device 360, an output device 370, and a communication interface 380. Bus 310 may include a path that permits communication among the components of node 210.

Processor 320 may include any type of processor, microprocessor, or processing logic that may interpret and execute instructions. Main memory 330 may include a random access memory (RAM) or another type of dynamic storage device that may store information and/or instructions for execution by processor 320. ROM 340 may include a ROM device or another type of static storage device that may store static information and/or instructions for use by processor 320. Storage device 350 may include a magnetic and/or optical recording medium and its corresponding drive or a removable memory device.

Input device 360 may include a mechanism that permits an operator to input information to node 210, such as a keyboard, a mouse, a pen, voice recognition and/or biometric mechanisms, etc. Output device 370 may include a mechanism that outputs information to the operator, including a display, a printer, a speaker, etc. Communication interface 380 may include any transceiver-like mechanism that enables node 210 to communicate with other devices and/or systems.

As will be described in detail below, node 210 may perform certain approximate counting-related operations. Node 210 may perform these operations in response to processor 320 executing software instructions contained in a computer-readable medium, such as memory 330. A computer-readable medium may be defined as one or more physical or logical memory devices and/or carrier waves.

The software instructions may be read into memory 330 from another computer-readable medium, such as data storage device 350, or from another device via communication interface 380. The software instructions contained in memory 330 cause processor 320 to perform processes that will be described later. Alternatively, hardwired circuitry may be used in place of or in combination with software instructions to implement processes described herein. Thus, implementations described herein are not limited to any specific combination of hardware circuitry and software.

FIG. 4 is a functional block diagram of an exemplary approximate counting system 400. Approximate counting system 400 may be implemented as hardware and/or software within one or more of nodes 210.

Approximate counting system 400 may track statistics on all or a subset of the features in repository 220. As shown in FIG. 4, approximate counting system 400 may perform a number of different hash functions 1, 2, . . . , N (where N>1) on a feature name. The feature name for a feature may include a unique string. For example, in the e-mail context, a feature name may include “Viagra,” “cheap3,” or “drugs.”

The feature name may be hashed by multiple, different hash functions to generate multiple, different hash values. Each of the hash functions may include any type of hash function, as long as the different hash functions produce different hash values based on the same input. Using multiple hash functions reduces the effect of collisions: for two different feature names to collide completely, each of the N hash functions would need to hash both feature names to the same hash value. In the case of a combination of features, the i-th hash value might be the sum of the i-th hash values for each of the individual features that form the combination. In another implementation, the i-th hash values for each of the individual features that form the combination may be combined in another manner.
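The following minimal sketch generates N hash values h_1, …, h_N in the range 0 to M−1 for one feature name. Each of the "different hash functions" is assumed here to be BLAKE2b keyed with a different index. Any family of distinct hash functions would serve equally well. For a combination of features, the i-th values are summed as described above. Reducing that sum modulo M so that it remains a valid bucket address is an added assumption. The function names are hypothetical.

```python
import hashlib


def hash_values(feature_name: str, num_hashes: int, num_buckets: int) -> list[int]:
    """Return hash values h_1..h_N, each in [0, M), for one feature name."""
    values = []
    for i in range(num_hashes):
        # A distinct key per index turns one keyed hash into N different hash functions.
        digest = hashlib.blake2b(
            feature_name.encode("utf-8"), digest_size=8, key=i.to_bytes(8, "little")
        ).digest()
        values.append(int.from_bytes(digest, "little") % num_buckets)
    return values


def combined_hash_values(names: list[str], num_hashes: int, num_buckets: int) -> list[int]:
    """i-th hash value of a feature combination: sum of the members' i-th values, mod M."""
    per_feature = [hash_values(n, num_hashes, num_buckets) for n in names]
    return [sum(column) % num_buckets for column in zip(*per_feature)]
```

Summation is order-independent, so with this choice "cheap3" combined with "drugs8" maps to the same buckets as "drugs8" combined with "cheap3".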

As shown in FIG. 4, the feature name may be hashed by hash function 1 to produce a hash value h_1. The feature name may be hashed by hash function 2 to produce a hash value h_2. Similarly, the feature name may be hashed by hash function N to produce a hash value h_N. Assume that hash values h_1, h_2, . . . , h_N include values that range from 0 to M−1. Hash values h_1, h_2, . . . , h_N may correspond to addresses in a memory.

The memory may, thus, include M memory locations (referred to herein as “buckets”), where M is a number less than the number of features. FIG. 5 is a functional block diagram of an exemplary configuration of the memory. As shown in FIG. 5, the memory may include M buckets (i.e., distinct memory locations) (shown in FIG. 5 as buckets 0 to M−1). Hash values h_1, h_2, . . . , h_N may identify different ones of these buckets. For example, as shown in FIG. 5, hash function 1 may hash the feature name to a hash value of h_1; hash function 2 may hash the feature name to a hash value of h_2; and hash function N may hash the feature name to a hash value of h_N. A read operation or a write operation may be performed on the statistics in the identified buckets.

Exemplary Process

FIGS. 6A-6C illustrate a flowchart of an exemplary process for performing approximate counting. The process may be performed by a single node 210 or a combination of multiple nodes 210. The process may use an approximate counting technique to track statistics regarding features to facilitate generation of rules for a model (e.g., a model that predicts whether future e-mails are spam, a model that predicts whether a user will select a particular advertisement, etc.).

During the model generation process, it may be beneficial to track statistics regarding features in repository 220. The statistics for a particular feature might include, for example, a feature count (e.g., the number of instances in repository 220 that include the particular feature) and/or a weight associated with the particular feature. The statistics might be used to identify those features that might be useful in forming rules for the model. For example, features whose statistics indicate that they occur in more than a threshold number of instances, and/or features whose weight values are significantly far from zero, might be included in a special group whose statistics and/or weights are tracked more exactly. The special group might still be quite large (e.g., over one thousand features). The goal of the model generation process may be to use the statistics of the special group and/or the set of approximate statistics to induce a function f such that, given a feature vector x, the output f(x) may be a good prediction of the true label y.

The process may begin with a feature being identified (block 605). For example, node 210 may select a feature that appears in at least one instance. Node 210 may process all or a subset of the features in repository 220. At any particular time, node 210 may process one or more features to perform some operation (e.g., a write operation or a read operation). For example, node 210 may need to update or read a value associated with the feature (e.g., a count value or a weight value).

Multiple hash values may be generated for the feature by subjecting the feature to multiple, different hash functions (block 610). For example, as shown in FIG. 4, the feature name may be subjected to hash functions 1, 2, . . . , N to generate hash values h_1, h_2, . . . , h_N.

Buckets corresponding to the hash values may be identified (block 615). For example, hash value h_1 may correspond to a first address, hash value h_2 may correspond to a second address, . . . , and hash value h_N may correspond to an Nth address. These addresses may identify particular buckets of the group of buckets in the memory. As shown in FIG. 5, for example, assume that the first address identifies bucket h_1, the second address identifies bucket h_2, and the Nth address identifies bucket h_N.

Values may be read from the identified buckets (block 620). For example, each of the buckets may store statistics (e.g., a count value and/or a weight value). An operation may be performed on the N values read from the N identified buckets. The operation may include a write operation or a read operation. For example, node 210 may need to increment a count value associated with the feature and, therefore, may perform a write operation. Alternatively, node 210 may need to simply determine the count value associated with the feature and, therefore, may perform a read operation.

If a write operation is performed (block 625: WRITE), the N values may be subjected to a write function (block 630) (FIG. 6B) that produces N resultant values. The N resultant values may be written to the N identified buckets (block 635).

For example, assume that the write function involves incrementing count values stored in the identified buckets by a particular amount. As shown in FIG. 7, assume that the count values include the following: 100 (bucket h_1), 105 (bucket h_2), and 5000 (bucket h_N). The differences among these values may be due to collisions that occurred when hashing feature names.

The write function may take the N values and increment them in some manner. The particular manner in which the N values are incremented may depend on performance considerations. In one exemplary implementation, the write function may determine the minimum count value (e.g., 100), increment the minimum count value by one (e.g., 100+1=101), and write the incremented value (e.g., 101) to each of the identified buckets. In another exemplary implementation, the write function may increment each of the count values by one (e.g., 100+1=101; 105+1=106; 5000+1=5001) and write the incremented values (101, 106, 5001) to the corresponding ones of the identified buckets. In yet another exemplary implementation, the write function may take the mean or median count value, increment that mean or median count value by one, and store the incremented value in each of the identified buckets. In a further implementation, the write function may increment the count values in another way.
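The following minimal sketch implements the three write functions described above and applies them to the FIG. 7 values. The buckets are modeled as a plain Python list, and the function names are hypothetical.

```python
import statistics
from typing import Callable


def write_min(values: list[float], amount: float = 1) -> list[float]:
    """Increment the minimum read value and write it to every identified bucket."""
    return [min(values) + amount] * len(values)


def write_each(values: list[float], amount: float = 1) -> list[float]:
    """Increment each read value independently."""
    return [v + amount for v in values]


def write_median(values: list[float], amount: float = 1) -> list[float]:
    """Increment the median read value and write it to every identified bucket."""
    return [statistics.median(values) + amount] * len(values)


def write_feature(buckets: list[float], indices: list[int],
                  write_fn: Callable[[list[float], float], list[float]],
                  amount: float = 1) -> None:
    """Read the N identified buckets, apply the write function, and store the N results."""
    read = [buckets[i] for i in indices]
    for i, v in zip(indices, write_fn(read, amount)):
        buckets[i] = v
```

With the FIG. 7 counts (100, 105, 5000), `write_min` produces 101 in each bucket, `write_each` produces 101, 106, and 5001, and `write_median` produces 106 in each bucket.

One design point to consider: `write_min` overwrites larger buckets that other, colliding features may also have incremented, so it can lower their counts. The well-known count-min sketch "conservative update" avoids this by setting each bucket to the maximum of its current value and the incremented minimum.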

If a read operation is performed (block 625: READ), the N values may be subjected to a read function to identify a read value (block 640) (FIG. 6C), which may then be output (block 645). For example, assume that the read function involves determining a single value from the count values stored in the identified buckets. As shown in FIG. 8, assume that the count values include the following: 100 (bucket h_1), 105 (bucket h_2), and 5000 (bucket h_N). The differences among these values may be due to collisions that occurred when hashing feature names.

The read function may take the N values and determine a single (approximated) value for the feature from the N values in some manner. The particular manner in which a single value is determined from the N values may depend on performance considerations. In one exemplary implementation, the read function may take one of the N count values (e.g., the minimum count value) as the single count value. In another exemplary implementation, the read function may determine the mean or median count value from the N count values and use this mean or median count value as the single count value. In yet another exemplary implementation, the read function may determine a single count value from the N count values in another way.
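The following minimal sketch implements the read functions described above and applies them to the FIG. 8 values. The function names and the list-based bucket memory are hypothetical.

```python
import statistics
from typing import Callable


def read_min(values: list[float]) -> float:
    """Use the minimum of the N read values as the single (approximated) value."""
    return min(values)


def read_mean(values: list[float]) -> float:
    """Use the mean of the N read values."""
    return statistics.mean(values)


def read_median(values: list[float]) -> float:
    """Use the median of the N read values."""
    return statistics.median(values)


def read_feature(buckets: list[float], indices: list[int],
                 read_fn: Callable[[list[float]], float]) -> float:
    """Read the N identified buckets and reduce them to one value."""
    return read_fn([buckets[i] for i in indices])
```

For the FIG. 8 counts (100, 105, 5000), `read_min` returns 100, `read_median` returns 105, and `read_mean` returns 1735. The median and the minimum are both much less sensitive to the one heavily collided bucket than the mean is.

When every write adds a non-negative amount to each identified bucket (the increment-each variant), collisions can only inflate bucket values. In that case the minimum never underestimates the true count, which is the idea behind the count-min sketch.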

The process of FIGS. 6A-6C may be repeated for other features in one or more nodes 210. In one implementation, a single node 210 may be selected to process a particular feature. For example, the feature name of the particular feature may be subjected to a hash function to generate a hash result that may be used to select which node 210 is to process the particular feature. The selected node 210 may then perform the process of FIGS. 6A-6C with regard to the particular feature.
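The following minimal sketch selects a node 210 for a feature by hashing its feature name. SHA-256 followed by a modulo over the node count is an assumed choice, and `select_node` and `partition` are hypothetical names.

```python
import hashlib


def select_node(feature_name: str, num_nodes: int) -> int:
    """Map a feature name to the index of the node that processes it."""
    digest = hashlib.sha256(feature_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(feature_names: list[str], num_nodes: int) -> dict[int, list[str]]:
    """Group features by selected node; every feature lands on exactly one node."""
    groups: dict[int, list[str]] = {}
    for name in feature_names:
        groups.setdefault(select_node(name, num_nodes), []).append(name)
    return groups
```

Because the mapping is deterministic, every node can compute which node owns a feature without coordination. With a plain modulo, however, changing the number of nodes reassigns most features. Consistent hashing is a common alternative if nodes are expected to be added over time.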

The process of FIGS. 6A-6C may be used to identify those features that appear useful in forming rules for a model. A feature might be determined to be useful if its count value is greater than a threshold (indicating that the feature has appeared in more than the threshold number of instances in repository 220) and/or its weight value is significantly far from zero. A useful feature may be promoted to a special group in which its statistics may be tracked exactly (or more exactly). Features from the special group may then be used to form rules for a model.
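The following minimal sketch shows the promotion test described above. It takes the approximate statistics through caller-supplied read functions, such as a hashed-bucket read. The threshold, the margin that stands for "significantly far from zero," and the function name are all hypothetical.

```python
from typing import Callable, Iterable


def promote(features: Iterable[str],
            read_count: Callable[[str], float],
            read_weight: Callable[[str], float],
            count_threshold: float,
            weight_margin: float) -> dict[str, dict[str, float]]:
    """Return the special group: features whose approximate count exceeds the
    threshold or whose approximate weight is far from zero, with their statistics
    copied out so that they can be tracked exactly from here on."""
    special: dict[str, dict[str, float]] = {}
    for name in features:
        count, weight = read_count(name), read_weight(name)
        if count > count_threshold or abs(weight) > weight_margin:
            special[name] = {"count": count, "weight": weight}
    return special
```

For example, with a count threshold of 100 and a weight margin of 0.5, a feature seen 120 times is promoted on its count. A rarely seen feature with weight −2.0 is promoted on its weight.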

While one approximate counting process has been described, other approximate counting processes may be used in other implementations. For example, rather than keeping track of statistics for all features, statistics may be tracked for only M features. In this case, the goal may be to keep track of the statistics for the M most important features and ignore the others.

This exemplary approximate counting process may initially keep track of the statistics for each new feature that is encountered in repository 220. Eventually, the M available memory locations in the memory will fill up. If a new feature u is encountered, then a decision may be made: either feature u can be ignored, or feature u can replace one of the M existing features. Each of the M existing features may be examined, and the feature with the least number of occurrences in instances in repository 220 may be replaced. Alternatively, the feature with a weight value closest to zero may be replaced, or some other criterion may be used to identify which feature to replace. In one implementation, the feature among the M existing features with the least number of occurrences, or with a weight value closest to zero, may always be replaced. Alternatively, it may be replaced based on some probability that perhaps depends on its stored statistics (e.g., discard the feature with a probability that is a function of 1/(number of instances that contain that feature)).
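The following minimal sketch implements the bounded-memory variant using the least-occurrences criterion. Features are tracked in a dictionary of at most M entries. When the dictionary is full, a new feature u either replaces the least-frequent feature or is ignored. In the optional probabilistic mode, the least-frequent feature is discarded with probability 1/count, which is one assumed instance of the "function of 1/(number of instances)" rule. The class name and parameters are hypothetical.

```python
import random
from typing import Optional


class BoundedFeatureCounts:
    """Track counts for at most M features, replacing the least-frequent one when full."""

    def __init__(self, capacity: int, probabilistic: bool = False,
                 rng: Optional[random.Random] = None) -> None:
        self.capacity = capacity            # M available memory locations
        self.probabilistic = probabilistic  # replace with probability 1/count if True
        self.rng = rng or random.Random()
        self.counts: dict[str, int] = {}

    def observe(self, feature: str) -> None:
        if feature in self.counts:            # already tracked: count the occurrence
            self.counts[feature] += 1
            return
        if len(self.counts) < self.capacity:  # memory not yet full
            self.counts[feature] = 1
            return
        # Memory is full: the replacement candidate is the least-frequent feature
        # (ties go to the earliest-inserted feature).
        victim = min(self.counts, key=self.counts.__getitem__)
        if self.probabilistic and self.rng.random() >= 1.0 / self.counts[victim]:
            return                            # keep the victim; ignore new feature u
        del self.counts[victim]
        self.counts[feature] = 1
```

Scanning for the minimum costs O(M) per replacement. A heap or a bucketed frequency list could reduce that cost if replacements are frequent.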

Alternatively, another suitable counting technique may keep track of integer counts (e.g., the number of instances matching a feature). If there is not enough memory to store the statistics for all of the features exactly, the statistics may be represented in a reduced number of bits (e.g., two bits may be used to represent the count, so the maximum count would be three). If a count for a feature exceeds three, then that feature may be promoted to the special group and have its statistics tracked exactly. This process is especially beneficial in situations where the majority of the features never exceed a count of three.
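The following minimal sketch implements the reduced-bit counting technique. It assumes that features have already been assigned integer ids from 0 up to the number of features minus 1. Four 2-bit counters are packed into each byte of a `bytearray`, so each counter holds 0 to 3. A count that would exceed three moves to an exact dictionary that stands in for the special group. The class name and the integer-id scheme are hypothetical.

```python
class PackedTwoBitCounts:
    """2-bit saturating counters with promotion to exact counts."""

    MAX_SMALL = 3  # largest value two bits can hold

    def __init__(self, num_features: int) -> None:
        self.bits = bytearray((num_features + 3) // 4)  # four counters per byte
        self.exact: dict[int, int] = {}                 # special group: exact counts

    def _get(self, fid: int) -> int:
        return (self.bits[fid // 4] >> (2 * (fid % 4))) & 0b11

    def _set(self, fid: int, value: int) -> None:
        shift = 2 * (fid % 4)
        byte = self.bits[fid // 4]
        # Clear this counter's two bits, then store the new value in them.
        self.bits[fid // 4] = (byte & ~(0b11 << shift) & 0xFF) | (value << shift)

    def increment(self, fid: int) -> None:
        if fid in self.exact:
            self.exact[fid] += 1
        elif (c := self._get(fid)) < self.MAX_SMALL:
            self._set(fid, c + 1)
        else:
            self.exact[fid] = c + 1  # count now exceeds three: promote
            self._set(fid, 0)        # free the 2-bit slot

    def count(self, fid: int) -> int:
        return self.exact[fid] if fid in self.exact else self._get(fid)
```

Memory use is one quarter of a byte per feature, plus one dictionary entry for each feature that has been promoted.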

Yet another exemplary approximate counting process may use a combination of the techniques described above.

Exemplary Model Generation Process

To facilitate generation of the model, a machine learning algorithm based on logistic regression may be used. One exemplary machine learning algorithm may be based on logistic regression using gradient descent. This exemplary machine learning algorithm may perform feature selection. In one implementation, a “rule set” may be maintained. A “rule set” may be defined as a set of functions that together can map a feature vector to a predicted label. In this exemplary machine learning algorithm, the “rule set” may include a set of features together with a weight for each feature.

The machine learning algorithm may be performed on multiple devices, such as nodes 210 (FIG. 2) or devices separate from nodes 210. Each of nodes 210 may track statistics for a subset of features.

Assume that a node 210 is given a new instance (x, y) to process. Node 210 may broadcast a request to all nodes 210 requesting the statistics for some number of features that are present in the feature vector x. The requested features might be those that are in a rule set for the model plus a set of other candidates (e.g., all single features and/or features formed by extending existing rule-set features by one other feature present in the feature vector). The returned statistics might come from the approximate counting pool, the special group (i.e., those statistics that are exactly tracked), or both.

The statistics for a feature may include the weight for that feature. Node 210 may use the weight for each feature that is in the rule set to compute the prediction:

$$f(x) = \sum_{i \in \text{rule set}} w_i \, x_i$$

where $w_i$ is the weight of feature_i, $x_i$ is the value of feature_i, and the sum is taken over all of the features in the rule set.

Node 210 may then compute an update for each feature (including those features that are not in the rule set):

$$w_i \leftarrow w_i + \alpha \, (y - f(x) - \beta) \, x_i$$

where $\alpha$ may include a small value (e.g., 0.01, or 1/(the number of instances that contain feature_i)).

If feature_i is not in the rule set, then $\beta = w_i \, x_i$; otherwise, $\beta = 0$. For a candidate feature outside the rule set, the error term $y - f(x) - \beta = y - (f(x) + w_i x_i)$ is therefore the error that the model would have if that feature were added to the rule set.

After the weights have been updated, some of the features may be promoted to the rule set (e.g., if their weight is significantly away from zero). The process may be repeated for subsequent instances.
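The following minimal sketch performs one online step of the update described above, applying the formulas exactly as written. Note that f(x) is the plain weighted sum and no logistic (sigmoid) link is applied. The weights are held in a local dictionary, which stands in for the statistics a node would gather from the other nodes. The promotion margin that stands for "significantly away from zero" is a hypothetical parameter, as are the function names.

```python
def predict(weights: dict[str, float], rule_set: set[str], x: dict[str, float]) -> float:
    """f(x): sum of weight * value over the features of x that are in the rule set."""
    return sum(weights.get(i, 0.0) * v for i, v in x.items() if i in rule_set)


def update(weights: dict[str, float], rule_set: set[str], x: dict[str, float],
           y: float, alpha: float = 0.01, promote_margin: float = 0.5) -> float:
    """Apply w_i <- w_i + alpha * (y - f(x) - beta) * x_i to every feature in x."""
    fx = predict(weights, rule_set, x)
    for i, v in x.items():
        w = weights.get(i, 0.0)
        # beta adds a candidate's own contribution, so its error is measured
        # as if it had already been added to the rule set.
        beta = 0.0 if i in rule_set else w * v
        weights[i] = w + alpha * (y - fx - beta) * v
    # Promote candidates whose weight has moved far enough from zero.
    for i in x:
        if i not in rule_set and abs(weights[i]) > promote_margin:
            rule_set.add(i)
    return fx
```

Here is a worked example with α = 0.5 and the single instance {"free": 1.0} with label 1. The first update moves the weight from 0 to 0.5. On the second update, β = 0.5, so the weight becomes 0.5 + 0.5·(1 − 0 − 0.5) = 0.75. That value exceeds the margin, so "free" joins the rule set.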

CONCLUSION

Implementations described herein may use an approximate counting technique to maintain statistics regarding a large set of data (e.g., a data set including more than one million instances and more than one hundred thousand features). A machine learning algorithm may use the statistics to form rules for a model.

The foregoing description provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention.

For example, while a series of blocks has been described with regard to FIGS. 6A-6C, the order of the blocks may be modified in other implementations. Also, non-dependent blocks may be performed in parallel. Further, the blocks may be modified in other ways. For example, in another exemplary implementation, the blocks of FIGS. 6A-6C may be performed in a loop for a number of features.

It will also be apparent that aspects, as described above, may be implemented in many different forms of software, firmware, and hardware in the implementations illustrated in the figures. The actual software code or specialized control hardware used to implement these aspects is not limiting of the invention. Thus, the operation and behavior of the aspects were described without reference to the specific software code, it being understood that software and control hardware can be designed to implement these aspects.

No element, act, or instruction used in the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise.

1. A method performed by one or more computer devices, comprising:storing, in a repository, information regarding a plurality of features;storing, in a plurality of memory locations in a memory, values relatingto the plurality of features; identifying a particular feature of theplurality of features in the repository; subjecting a string, associatedwith the particular feature, to multiple, different hash functions togenerate multiple, different hash values; identifying, for each of themultiple, different hash values, a respective memory location, of theplurality of memory locations in the memory; reading the values storedat the respective memory locations; performing an operation on the readvalues from the respective memory locations to obtain updated values;writing the updated values into the respective memory locations; andusing the values, including the updated values, to make a predictionregarding particular data.
 2. The method of claim 1, where performingthe operation on the read values includes incrementing each of the readvalues by a particular amount.
 3. The method of claim 1, whereperforming the operation on the read values includes: identifying aminimum value from the read values, updating the minimum value, andreplacing each of the read values with the updated minimum value.
 4. Themethod of claim 3, where writing the updated values into the respectivememory locations includes writing the updated minimum value into each ofthe respective memory locations.
 5. The method of claim 1, whereperforming the operation on the read values includes: determining a meanor median value from the read values, updating the mean or median value,and replacing each of the read values with the updated mean or medianvalue.
 6. The method of claim 5, where writing the updated values intothe respective memory locations includes writing the updated mean ormedian value into each of the respective memory locations.
7. The method of claim 1, further comprising: determining a single value from the read values; and replacing each of the read values with the single value.
8. The method of claim 7, where determining the single value includes: using one of the read values as the single value.
9. The method of claim 7, where determining the single value includes: determining a mean or median value from the read values, and using the mean or median value as the single value.
10. The method of claim 1, where using the values, including the updated values, to make the prediction includes generating rules for a model based on the values including the updated values.
11. A method performed by one or more computer devices, comprising: processing, by one or more processors of the one or more computer devices, each particular feature of a plurality of features in a repository, where processing each particular feature includes: performing, by one or more processors of the one or more computer devices, a plurality of different hash functions on a string, associated with the particular feature, to generate a corresponding plurality of different hash values, identifying, by one or more processors of the one or more computer devices, buckets, of a plurality of buckets in a memory, based on the plurality of different hash values, reading, by one or more processors of the one or more computer devices, a statistical value from each of the identified buckets, updating, by one or more processors of the one or more computer devices, each of the statistical values by subjecting each of the statistical values to a particular function to generate updated statistical values, and writing, by one or more processors of the one or more computer devices, each of the updated statistical values into a corresponding one of the identified buckets; identifying a group of features, of the plurality of features, based on the statistical values, including the updated statistical values, in the plurality of buckets; and using, by one or more processors of the one or more computer devices, the identified group of features and the statistical values associated with the identified group of features to make a prediction regarding particular data.
12. The method of claim 11, where the particular data includes particular e-mail data, and where using the identified group of features and the statistical values associated with the identified group of features to make a prediction regarding particular data includes using the identified group of features and the statistical values associated with the identified group of features to predict whether the particular e-mail data includes spam.
13. The method of claim 11, where the particular data includes particular advertisement data, and where using the identified group of features and the statistical values associated with the identified group of features to make a prediction regarding particular data includes using the identified group of features and the statistical values associated with the identified group of features to predict whether the particular advertisement data will be selected by a user.
14. The method of claim 11, where updating each of the statistical values includes: identifying a minimum value from the statistical values, updating the minimum value, and replacing each of the statistical values with the updated minimum value.
15. The method of claim 14, where writing each of the updated statistical values into the corresponding one of the identified buckets includes writing the updated minimum value into each of the identified buckets.
16. The method of claim 11, where updating each of the statistical values includes: determining a mean or median value from the statistical values, updating the mean or median value, and replacing each of the statistical values with the updated mean or median value.
17. The method of claim 16, where writing each of the updated statistical values into the corresponding one of the identified buckets includes writing the updated mean or median value into each of the identified buckets.
18. The method of claim 11, where updating each of the statistical values includes replacing each of the statistical values with one of the statistical values.
19. The method of claim 11, further comprising: determining a mean or median value from the statistical values, and replacing each of the statistical values with the mean or median value.
20. A system, comprising: one or more first memory devices to store information regarding a plurality of features; one or more second memory devices to store, in a plurality of memory locations, statistical values relating to the plurality of features; and one or more computer devices to: identify a particular feature of the plurality of features in the one or more first memory devices, subject a string, associated with the particular feature, to multiple, different hash functions to generate multiple, different hash values, identify, for each of the multiple, different hash values, a respective memory location, of the plurality of memory locations in the one or more second memory devices, read the statistical values stored in the respective memory locations, perform an operation on the read statistical values to obtain updated statistical values, write the updated statistical values into the respective memory locations, and use the statistical values, including the updated statistical values, to predict whether a particular e-mail includes spam or to predict whether a particular advertisement will be selected by a user.
21. The system of claim 20, where, when performing the operation on the read statistical values, the one or more computer devices are to: identify a minimum value from the read statistical values, update the minimum value, and replace each of the read statistical values with the updated minimum value; and where, when writing the updated statistical values into the respective memory locations, the one or more computer devices are to write the updated minimum value into each of the respective memory locations.
22. The system of claim 20, where, when performing the operation on the read statistical values, the one or more computer devices are to: determine a mean or median value from the read statistical values, and update the mean or median value; and where, when writing the updated statistical values into the respective memory locations, the one or more computer devices are to write the updated mean or median value into each of the respective memory locations.
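The per-feature update recited in claims 1 through 9 (and mirrored in claims 14 through 19, 21, and 22) can be illustrated with a minimal Python sketch. It fills in details the claims leave open, all of which are assumptions: the multiple, different hash functions are modeled as BLAKE2b salted with the function index, the buckets are a plain list, "updating" a value means adding a fixed `amount`, and a feature's statistic is read back as the minimum over its buckets. The names `bucket_indices`, `update`, and `estimate` are hypothetical.

```python
import hashlib
import statistics


def bucket_indices(feature: str, num_hashes: int, num_buckets: int) -> list[int]:
    """Map a feature string to one bucket per hash function.

    Assumption: the "different hash functions" are BLAKE2b salted with the
    function index; any family of independent hashes would serve.
    """
    indices = []
    for i in range(num_hashes):
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8,
                                 salt=i.to_bytes(16, "little")).digest()
        indices.append(int.from_bytes(digest, "little") % num_buckets)
    return indices


def update(buckets: list[float], feature: str, mode: str = "increment",
           amount: float = 1.0, num_hashes: int = 3) -> list[float]:
    """Read the feature's buckets, update the values, and write them back.

    mode="increment": add `amount` to each read value (cf. claim 2).
    mode="min": update the minimum read value and write it into every
        identified bucket (cf. claims 3-4).
    mode="mean" or "median": same, using the mean or median (cf. claims 5-6).
    """
    locs = bucket_indices(feature, num_hashes, len(buckets))
    read_values = [buckets[j] for j in locs]            # read
    if mode == "increment":
        updated = [v + amount for v in read_values]
    elif mode in ("min", "mean", "median"):
        pick = {"min": min, "mean": statistics.mean,
                "median": statistics.median}[mode]
        single = pick(read_values) + amount              # one updated value
        updated = [single] * len(read_values)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    for j, v in zip(locs, updated):                      # write back
        buckets[j] = v
    return updated


def estimate(buckets: list[float], feature: str, num_hashes: int = 3) -> float:
    """Read a statistic back; taking the minimum is an assumption here."""
    return min(buckets[j] for j in bucket_indices(feature, num_hashes, len(buckets)))
```

Writing one updated minimum into every identified bucket, as claims 3 and 4 recite, can lower a bucket that is shared with a more frequent feature. The conservative-update variant often used with count-min sketches instead sets each bucket to the larger of its current value and the updated minimum, so counters never decrease.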
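Claims 10 through 13 and 20 go one step further: the stored statistics are used to identify a group of features and to make a prediction, such as whether an e-mail is spam. The sketch below is one hypothetical way to do that and is not the rule-generation method of the application. It assumes two hashed count tables (all e-mails and spam e-mails) sharing a BLAKE2b-based hash family, selects features whose estimated count reaches `min_count`, and flags a message as spam when the average estimated spam rate of its selected features exceeds `threshold`. `SpamStats`, its methods, and both parameters are illustrative names.

```python
import hashlib


def bucket_indices(feature: str, num_hashes: int, num_buckets: int) -> list[int]:
    # Hypothetical hash family: BLAKE2b salted with the function index.
    return [
        int.from_bytes(hashlib.blake2b(feature.encode("utf-8"), digest_size=8,
                                       salt=i.to_bytes(16, "little")).digest(),
                       "little") % num_buckets
        for i in range(num_hashes)
    ]


class SpamStats:
    """Two hashed count tables sharing one hash family (hypothetical design).

    total[j] counts training e-mails containing a feature mapped to j, and
    positive[j] counts the subset labeled spam. Every spam update also
    updates `total` at the same buckets, so positive[j] <= total[j].
    """

    def __init__(self, num_buckets: int = 1 << 16, num_hashes: int = 3) -> None:
        self.total = [0] * num_buckets
        self.positive = [0] * num_buckets
        self.num_hashes = num_hashes

    def _locs(self, feature: str) -> list[int]:
        return bucket_indices(feature, self.num_hashes, len(self.total))

    def observe(self, features: list[str], is_spam: bool) -> None:
        for f in set(features):
            for j in set(self._locs(f)):  # increment each distinct bucket once
                self.total[j] += 1
                if is_spam:
                    self.positive[j] += 1

    def count(self, feature: str) -> int:
        # Collisions only add to buckets, so the minimum never underestimates.
        return min(self.total[j] for j in self._locs(feature))

    def spam_rate(self, feature: str) -> float:
        n = self.count(feature)
        return min(self.positive[j] for j in self._locs(feature)) / n if n else 0.0

    def select(self, candidates: list[str], min_count: int) -> set[str]:
        # Identify the group of features whose estimated count is large enough.
        return {f for f in candidates if self.count(f) >= min_count}

    def predict_spam(self, features: list[str], selected: set[str],
                     threshold: float = 0.5) -> bool:
        rates = [self.spam_rate(f) for f in set(features) if f in selected]
        return bool(rates) and sum(rates) / len(rates) > threshold
```

Replacing the spam label with a clicked or not-clicked label would give the advertisement case of claim 13 with the same structure.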